Abstract

Background: Ralstonia solanacearum is a soil-borne beta-proteobacterium that causes bacterial wilt disease in many 
food crops and is a major problem for agriculture in intertropical regions. R. solanacearum is a heterogeneous species,
both phenotypically and genetically, and is considered a species complex. Pathogenicity of R. solanacearum relies on
the Type III secretion system that injects Type III effector (T3E) proteins into plant cells. T3E collectively perturb host cell 
processes and modulate plant immunity to enable bacterial infection. 

Results: We provide the catalogue of T3E in the R. solanacearum species complex, as well as candidates in newly
sequenced strains. Ninety-four T3E orthologous groups were defined on phylogenetic bases and ordered using a uniform
nomenclature. This curated T3E catalogue is available on a public website and a bioinformatic pipeline has been
designed to rapidly predict T3E genes in newly sequenced strains. Systematic analyses were performed to detect
lateral T3E gene transfer events and identify T3E genes under positive selection. Our analyses also pinpoint the RipF
translocon proteins as major discriminating determinants among the phylogenetic lineages.

Conclusions: Establishing the T3E repertoires of strains representative of the R. solanacearum biodiversity allowed
us to determine a set of 22 T3E present in all the strains, but provided no clues on host specificity determinants. The
definition of a standardized nomenclature and the optimization of predictive tools will pave the way to understanding
how variation of these repertoires is correlated with the diversification of this species complex and how they contribute
to the different strain pathotypes.
Background

Ralstonia solanacearum is a widely distributed soil-borne 
phytopathogen belonging to the beta subdivision of 
Proteobacteria [1]. It causes lethal bacterial wilt of more 
than 200 plant species, including economically important 
crops [2,3]. Among the pathogenicity determinants of this
bacterium, the Type III Secretion System (T3SS) plays a
crucial role because mutants unable to produce this
specialized secretion machinery are unable to cause disease
on plants [4]. This T3SS ensures the direct translocation
of Type III effector (T3E) proteins from the bacterium to
the plant cell cytosol [5,6]. These T3E are presumed to
perturb host cell processes and modulate plant innate
immunity to allow bacterial infection [7].

Phylogenetic analyses of Ralstonia strains causing wilt 
diseases revealed an extensive diversity [8,9] and this group 
of organisms is now commonly called the R. solanacearum 
species complex (RSSC hereafter) [10]. This species
complex includes strains with broad and narrow host ranges
and with different geographic origins. Based on phylogenetic
analyses and on comparative genomic hybridization, the
RSSC has been classified into four phylogenetic groups called
phylotypes, which reflect their origins as follows: Asia
(phylotype 1), the Americas (phylotype 2), Africa (phylotype 3)
or Indonesia (phylotype 4, which includes Ralstonia syzygii
and the banana blood disease bacterium BDB) [8,11,12].
To date, 14 strains belonging to the RSSC have been 
completely sequenced. 

Pioneering studies have established that T3E repertoires 
are highly variable among strains and shape the host range 
of bacterial pathogens [13,14]. The first exhaustive inventories
of RSSC T3E, using different in silico or experimental
approaches, were made in phylotype 1 strains GMI1000 [5,7]
and RS1000 [6,15]. GMI1000 and RS1000 have almost
identical repertoires, comprising 72 and 74 T3E respectively,
for which T3SS-dependent plant cell targeting has been
experimentally validated in RS1000 [6,15]. A feature
of these repertoires is the existence of multigenic T3E 
families [7]. Functional studies have been carried out 
on members of the Gala family, which are proteins with 
F-box and Leucine Rich Repeat domains collectively 
required for full virulence [16-18], and members of the 
PopP family, which includes the avirulence proteins 
PopP1 [19] and PopP2, the latter possessing
acetyltransferase activity [20-22]. Recently, a functional
analysis of the AWR family demonstrated that some AWR
T3E induce necrotic cell death reactions on plants and
are required for full virulence [23].

The genome sequence data from strains representative
of the biodiversity of the RSSC open the way towards
understanding the evolutionary processes that structured
their T3E gene repertoire. This will also provide clues
towards defining what makes a given strain more
aggressive than others on a specific host. However, such
comparative genomic approaches are currently hampered by
the fact that T3E inventories in multiple strains have not 
been accurately established: several T3E genes have been 
overlooked by automatic annotation programs and/or 
have been incorrectly predicted. Moreover, the lack of a 
unified nomenclature for RSSC T3E is confusing for a 
non-expert since many T3E genes from RSSC strains 
have different names in the published literature (Pop, 
Avr, Brg, Rip, Hpx or Lrp proteins). This complicates
the already difficult task of identifying orthologous and
paralogous genes in strains harboring between 46 and 71
T3E genes.

This work presents an integrative and comprehensive 
database for the T3E of the RSSC. This database is a 
compendium of manually re-annotated genes across 11 
sequenced strains and ordered with a novel and unifying 
nomenclature. This database is publicly available for 
browsing and retrieving data and information. Our
analyses of this particular gene set, at the forefront of the
interaction between the bacterium and its host, provide
new insight into their evolutionary history and their
potential contribution to host specificity.



Results and discussion 

Ralstonia solanacearum T3E database 

Inventory and re-annotation of T3E genes in the RSSC

Our goal is to provide an inventory of T3E in the RSSC
that is as comprehensive and exhaustive as possible,
as a public database from which curated information
can be retrieved. To this end, we manually curated and
compiled the T3E genes from eleven sequenced strains 
representative of the genetic diversity of the RSSC (see 
Methods). The workflow of the retrieval and annotation 
of the T3E genes from the RSSC genomes as well as 
the main outputs of this analysis are shown in Figure 1. 
The inventory of T3E in the published RSSC genome 
sequences was primarily based on homology searches 
with the established repertoires of strains GMI1000 [7]
and RS1000 [6]. Identification of additional T3E was
conducted using criteria defined previously [5] to mine
the GMI1000 genome: (i) homology to known T3E in
other bacterial species, (ii) presence of a hrpII box in
the promoter region, since 52/70 T3E gene promoters
harbor this cis-regulatory element in GMI1000 [24], and
(iii) existence of specific amino acid distribution biases
in the 50 N-terminal residues [24]. The latter two
criteria were hampered by the fact that many T3E genes
have wrongly annotated start codons. Hence all the
genes possessing a putative hrpII box were inspected
for potential start codon errors before being included
in the T3E annotation workflow (see Figure 1). This process
led to the "discovery" of twenty new T3E genes (generating 
42 new gene accessions), and the re-annotation of 34% 
of the existing RSSC T3E genes. Altogether these changes 
affect 39% of the RSSC T3E dataset (841 individual 
entries) submitted or already present in GenBank to date. 
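
As an illustration of criterion (ii), the sketch below scans a promoter sequence for candidate hrpII boxes. It assumes the consensus TTCG-N16-TTCG reported elsewhere in the literature for this element; the consensus is not stated in this article, and the function name and the example sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed consensus for the hrpII box: TTCG, 16 arbitrary bases, TTCG.
# This pattern is an assumption and is not given in the article text.
# The lookahead lets overlapping candidate boxes be reported as well.
HRPII_BOX = re.compile(r"(?=(TTCG[ACGT]{16}TTCG))")


def find_hrpii_boxes(promoter: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched motif) for every candidate box."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in HRPII_BOX.finditer(seq)]


# Hypothetical promoter fragment containing one candidate box.
promoter = "acgt" + "TTCG" + "A" * 16 + "TTCG" + "ggca"
hits = find_hrpii_boxes(promoter)
```

A hit only marks a candidate: as noted above, the annotated start codon still has to be checked before the gene enters the annotation workflow.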

Identification of T3E candidate genes in RSSC strains

The genomes of nine RSSC strains from phylotypes 2, 3
and 4 were mined for previously undescribed T3E gene
families, based on the criteria listed above [5]. In this
process, we only kept the T3E candidates strictly meeting
both criteria (ii) and (iii) described above. This search
yielded 16 RSSC T3E candidates, for which T3SS-dependent
translocation has not yet been demonstrated. These 16
hypothetical T3E gene families are listed in Additional
file 1 as well as in the RSSC-T3E database. Most of the
corresponding genes did not display homology to any other
known proteins, except for families RSSC-T3E-Hyp5, Hyp6
and Hyp7, which have homologues only in Acidovorax spp.
or Xanthomonas spp., both of which are plant pathogenic
bacteria.

Pseudogenes 

In many cases, T3E genes appeared to have frameshift 
mutations or to be split into several independent open 
reading frames on the assembled genomes. This could 
Figure 1 Workflow for T3E identification in RSSC strains and main outputs of the analysis. A. Flowchart for identification and manual
annotation of T3E genes in the RSSC. B. T3E statistics for each of the curated strains in this study. Footnotes to panel B: number of T3E genes with a potential
frameshift mutation within the coding sequence; number of annotated pseudogenes (incomplete or disrupted coding frames); number of
hypothetical (candidate) T3E genes; total estimated number of T3E genes (= number of T3E + frameshifted T3E + hypothetical T3E); not
determined, since the genome sequence of strain RS1000 is not available.
be due to mutations leading to gene inactivation or,
more probably, to sequencing and assembly errors in the
available genome sequences. It should be noted that
the available assembled genomes differ substantially in
quality (see Methods). In some other cases, genome
sequence gaps resulted in incomplete T3E gene prediction.
Genes encoding T3E with internal repeats are often
predicted as truncated or incomplete, probably because
repeat-containing regions are difficult to assemble from
short sequence reads (Next Generation Sequencing
techniques). Frameshift-mutated and incomplete T3E genes
were included in the RSSC-T3E database and are
distinguished by the prefix fs ('frameshift') before
the gene name. Future re-sequencing should verify the
current pseudogene status of these genes.

Probable non-functional pseudogenes are also listed in
the RSSC-T3E database (with the "pg" prefix, for
pseudogene). These pseudogenes correspond to genes or gene
fragments which are either gene remnants, open reading 
frames disrupted by a transposable element insertion or 
frameshift mutated genes confirmed after re-sequencing. 
The number of predicted pseudogenes varies from one 
to eight among the eleven strains analyzed (Figure 1B).
However, the formal distinction between a pseudogene 
and a functional gene is difficult to establish without 
experimental validation [25]. In some cases, the
absence of specific domains (e.g. RipC1_CMR15 is lacking the
C-terminal half present in other RipC1 alleles) raises
the question of the functionality of the corresponding
protein.

The RSSC-T3E database interface 

The dataset corresponding to the lists and expert
annotation of validated and candidate T3E in the 11 sequenced
strains representative of the 4 RSSC phylotypes was
compiled in a web interface named "Ralstonia T3E"
(https://iant.toulouse.inra.fr/T3E) designed to provide
the user with convenient and straightforward access
to all the underlying data. The home page provides a
summary table displaying the distribution of the 94 T3E
gene families in the RSSC strains under the proposed 
nomenclature (see below). This table summarises for 
each strain whether a gene member is present (in single 
or multiple copies), absent, or is predicted as being not 
functional (pseudogene). A specific colour code also 
indicates genes with putative frameshift mutations. This 
information is also available as a table in Additional
file 2. The clickable T3E genes provide a link to multifasta
files of the curated nucleotide and protein sequences as
well as a view of the corresponding DNA and protein
alignments [26]. Tab-style navigation provides a link to
the 16 T3E candidate genes as well as links to different
services such as "ScanYourGenome" (see hereafter), PatScan,
HMScan and Blast.



Proposed guidelines for the nomenclature of T3Es in 
RSSC strains 

The recent availability of complete genome sequences
for a number of RSSC strains has led to a significant
increase in the rate of T3E discovery. However, the
absence of a systematic nomenclature has resulted in
multiple names being assigned to the same T3E gene.
Some genes were named as brg (hrpB-regulated genes)
[5] or hpx (hrpB-dependent expression) [27] genes based
on regulation studies/screens, or as Rip (Ralstonia injected
protein) genes [5,6]. We propose using the generic
term Rip to rename all the T3E genes in the RSSC,
a term previously used after demonstration of the
translocation of these effectors into plant cells [5,6]. This
new nomenclature should follow the rules defined
previously for naming the P. syringae T3E [28], such as
RipXY#_strain, wherein the gene is indicated by alphabetic
characters, paralogous genes by numeric characters,
and the strain in subscript. The proposed attribution
of this novel nomenclature to known translocated
RSSC T3E is presented in Table 1 (and Additional file 3).
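
The naming rule can be made concrete with a small parser. This is only a sketch: it assumes a plain-text rendering of the form Rip<letters><paralog number>, optionally followed by _<copy number> and _<strain> (the strain being a subscript in print). That layout, the class name and the function name are illustrative assumptions, not part of the proposed nomenclature itself.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed plain-text layout: Rip<letters><paralog#>[_<copy#>][_<strain>].
# Strain labels are assumed to start with a letter (e.g. Molk2, CMR15),
# which is what distinguishes them from a numeric copy index.
RIP_NAME = re.compile(
    r"^Rip(?P<family>[A-Z]+)(?P<paralog>\d+)?"
    r"(?:_(?P<copy>\d+))?(?:_(?P<strain>[A-Za-z]\w*))?$"
)


@dataclass(frozen=True)
class RipName:
    family: str             # alphabetic gene code, e.g. "A" or "AX"
    paralog: Optional[int]  # paralog number, e.g. 5 in RipA5
    copy: Optional[int]     # recent-duplication copy index, e.g. 1 in RipA5_1
    strain: Optional[str]   # strain label, e.g. "Molk2"


def _to_int(text: Optional[str]) -> Optional[int]:
    return int(text) if text is not None else None


def parse_rip_name(name: str) -> RipName:
    """Split a Rip name into family, paralog, copy index and strain."""
    m = RIP_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised Rip name: {name!r}")
    return RipName(m["family"], _to_int(m["paralog"]),
                   _to_int(m["copy"]), m["strain"])
```

For example, `parse_rip_name("RipA5_1_Molk2")` separates the family A, paralog 5, copy 1 and strain Molk2, while `parse_rip_name("RipB")` leaves the optional fields empty.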

After identifying groups of homologous genes by
reciprocal best hit in the curated list of likely RSSC T3E
genes, we concentrated our effort on grouping the
different genes into orthologous groups and naming them
accordingly. Three situations can occur: (i) a single hit (or
no hit) in each strain, with conservation of synteny on
the genome; (ii) a single hit (or no hit) in each strain,
but with a breach of synteny for at least one of the
homologous genes; (iii) multiple hits (two or more for at
least one strain) in different strains.
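
The three situations can be written as a small decision function. This is a minimal sketch under stated assumptions: hit counts per strain and a synteny flag are taken as already computed, and the function name, inputs and example counts are hypothetical.

```python
from typing import Dict


def classify_homolog_group(hits_per_strain: Dict[str, int],
                           synteny_conserved: bool) -> str:
    """Return "i", "ii" or "iii" for the situations listed in the text.

    hits_per_strain maps a strain name to the number of reciprocal best
    hits found in that strain (0 when the gene is absent).
    """
    if any(count >= 2 for count in hits_per_strain.values()):
        return "iii"  # two or more hits in at least one strain
    if synteny_conserved:
        return "i"    # single hit (or none) everywhere, synteny kept
    return "ii"       # single hit (or none) everywhere, synteny broken


# Hypothetical hit counts for three strains.
single_copy = {"GMI1000": 1, "CMR15": 1, "Psi07": 0}
multi_copy = {"GMI1000": 1, "CMR15": 1, "Psi07": 3}
```

Only the outcome "i" maps directly to a single orthologous group; the other two call for the synteny-based reasoning described in the following paragraphs.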

In the first case a single orthologous group is defined
irrespective of the pairwise identity between the
orthologous genes. This can be exemplified by RipB, a single
gene present in all strains with pairwise amino acid
identity ranging from 72 to 100%. Another case is RipU, also
a single gene present in all strains with a strict
conservation of synteny, but with surprisingly divergent members
(pairwise amino acid identity ranging from 23 to 100%).
Even though it is likely that RipU has evolved different
functions in the different strains, we advocate keeping a
single orthology group, based on the likely common
ancestral origin suggested by the conservation of
synteny [29,30].
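
For readers who want to reproduce identity ranges of this kind, the sketch below computes pairwise percent identity from two rows of an alignment. The convention used here (identical residues divided by the number of columns where neither sequence has a gap) is an assumption; the article does not state which convention was applied, and the example sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Invented toy alignment: 5 comparable columns, 4 of them identical.
identity = percent_identity("MKT-LA", "MRTSLA")
```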

In the second situation, an apparent single orthologous
group exists but differences in synteny support a
scenario of gene duplication followed by gene loss or lateral
gene transfer between strains. Here we favour synteny as
a rule for ortholog definition [29,30]. This is
exemplified by RipO1 and RipO2, the latter being present only
in strain R24, which is devoid of RipO1.

Finally, when there are strains with two or more
paralogous genes, we again favour the synteny rule to
identify groups of orthology [29]. A careful phylogenetic



Table 1 List of the T3E genes currently identified in the R. solanacearum species complex and proposal for a unified nomenclature

Proposed T3E family name | Representative gene member | Former/other name | Evidence for T3SS-dependent secretion or translocation
RipA1 | RSc2139 | AWR1 | RipA1 [23]
RipA2 | RSp0099 | RipA, Rip29, Hpx31, AWR2 | RipA [5], Rip29 [6]
RipA3 | RSp0846 | Rip44, Hpx32, AWR3 | Rip44 [6]
RipA4 | RSp0847 | Rip45, Hpx4, AWR4 | Rip45 [6]
RipA5 | RSp1024 | Rip56, Hpx10, AWR5 | Rip56 [6]
RipB | RSc0245 | RipB, Rip2, Hpx11 | RipB [5], Rip2 [6]
RipC1 | RSp1239 | Rip62 | Rip62 [6]
RipC2 | CFBP2957 RCFBP_mp20032 | |
RipD | RSp0304 | Rip34, Hpx25, Brg8 | Rip34 [6]
RipE1 | RSc3359 | Rip26, Brg9 | Rip25 [6]
RipE2 | CFBP2957 RCFBP_mp10565 | |
RipF1 | RSp1555 | PopF1, PopF2, Rip70 | RipF1 [6], PopF1 [35]
RipF2 | CFBP2957 RCFBP_mp30453 | |
RipG1 | RSp0914 | Gala1, Rip53 | Rip53 [6]
RipG2 | RSp0672 | Gala2, Rip37, Hpx20 | Rip37 [6]
RipG3 | RSp0023 | Gala3, Rip28 | Rip28 [6]
RipG4 | RSc1800 | Gala4, Rip17, Hpx15 | Rip17 [6]
RipG5 | RSc1801 | Gala5, Rip18, Hpx15 | Rip18 [6]
RipG6 | RSc1356 | RipG, Gala6, Rip13, Hpx13 | RipG [5], Rip13 [6]
RipG7 | RSc1357 | Gala7, RipM, Hpx14 | Gala7 [16], RipM [6]
RipG8 | CMR15 CMR15v4_10224 | Gala8 |
RipH1 | RSc1386 | HLK1, Rip15, Brg19 | Rip15 [6]
RipH2 | RSp0215 | HLK2, Rip32 | Rip32 [6]
RipH3 | RSp0160 | HLK3, Rip30, Brg18 | Rip30 [6]
RipH4 | Psi07 RPSI07_mp0161 | HLK4 |
RipI | RSc0041 | Rip1 | Rip1 [6]
RipJ | RSc2132 | Rip22 | Rip22 [6]
RipK | CFBP2957 RCFBP_mp10024 | |
RipL | RSp0193 | Rip31, Brg22 | Rip31 [6]
RipM | RSc1475 | Rip16, Brg42 | Rip16 [6]
RipN | RSp1130 | Rip58, Hpx26, Brg44 | Rip58 [6]

Hop/Xop homologues (entries in table order): HopQ/XopQ; XopC; XopC; HopD/XopB; HopX/XopE; HopX/XopE; XopP; XopP; XopP; XopP; HopZ/XopJ

Functional domain/motif or function (entries in table order): Nucleoside N-ribohydrolase; T3SS translocator; T3SS translocator; F-box Leucine-Rich Repeats; F-box LRR protein; F-box LRR protein; F-box LRR protein; F-box LRR protein; F-box LRR protein; F-box LRR protein; Putative acetyltransferase; YopJ acetyltransferase domain; Pentatricopeptide Repeats; Nudix hydrolase domain


Proposed T3E family name | Representative gene member
RipO1 | RSp0323
RipO2 | R. syzygii RALSY_mp301S9
RipP1 | RSc0826
RipP2 | RSc0868
RipP3 | UW163 [GenBank accession: CAF32358.1]
RipQ | RSp1277
RipR | RSp1281
RipS1 | RSc3401
RipS2 | RSp1374
RipS3 | RSp0930
RipS4 | RSc1839
RipS5 | RSp0296
RipS6 | RSc2130
RipS7 | Molk2 RSMK02658
RipS8 | Psi07 RSPsi07_1850
RipT | RSc3212
RipU | RSp1212
RipV1 | RSc1349
RipV2 | Psi07 RSPsi07_1895
RipW | RSc2775
RipX | RSp0877
RipY | RSc0257
RipZ | RSp1031
RipAA | RSc0608
RipAB | RSp0876
RipAC | RSp0875
RipAD | RSp1601
RipAE | RSc0321
RipAF1 | RSp0822
RipAF2 | R. syzygii RALSY_20037
RipAG | RSc0824
RipAH | RSc0895



Former/other name (entries in table order): Rip35, 5l2; PopP1, Rip7; PopP2, Rip8; PopP3; Rip53, Hpx23; Rip54, Hpx24, Brg15, PopS; SKWP1, Rip27, Hpx37; SKWP2, Rip65, Hpx36; SKWP3, Rip54; SKWP4, Rip20, Hpx30; SKWP5, Rip33, Hpx34; SKWP6; SKWP7; SKWP8; RipT, Rip25; Rip59; Rip12, Hpx29, Brg17; PopW, Rip24; PopA, Rip49; Rip3, Brg23; Rip57, Brg38; AvrA, Rip5, Brg45; PopB, Rip48; PopC, Rip47; Rip72; Rip4; Rip40; Rip5; Rip11

Hop/Xop homologues (entries in table order): HopG; HopG; HopZ/XopJ; HopZ/XopJ; HopZ/XopJ; HopAA; HopR; XopAD; Hope; XopAE; HopZ/XopJ; HopF; HopF

Functional domain/motif or function (entries in table order): Putative acetyltransferase; Acetyltransferase; Putative acetyltransferase; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Heat/Armadillo repeat domain; Putative cysteine protease; Ubiquitin ligase domain; Ubiquitin ligase domain; Harpin, Pectate lyase; Harpin; Ankyrin Repeats; Leucine-Rich Repeats; Putative acetyltransferase; Putative ADP-ribosyltransferase; Putative ADP-ribosyltransferase

Evidence for T3SS-dependent secretion or translocation (entries in table order): Rip35 [6]; Rip7 [6], PopP1 [36]; PopP2 [5], Rip8 [6]; Rip53 [6]; Rip54 [6]; Rip27 [6]; Rip55 [6]; Rip54 [6]; Rip20 [6]; Rip33 [6]; RipT [5], Rip25 [6]; Rip59 [6]; Rip12 [6]; Rip24 [6], PopW [34]; Rip49 [6], PopA [74]; Rip3 [6]; Rip57 [6]; AvrA [31], Rip5 [6]; Rip48 [6], PopB [33]; Rip47 [6], PopC [33]; Rip72 [6]; Rip4 [6]; Rip40 [6]; Rip5 [6]; Rip11 [6]
Proposed T3E family name | Representative gene member | Former/other name | Hop/Xop homologues | Functional domain/motif or function | Evidence for T3SS-dependent secretion or translocation
RipAI | RSp0838 | Rip41 | | | Rip41 [6]
RipAJ | RSc2101 | Rip21, HpxIB | | | Rip21 [6]
RipAK | RSc2359 | Rip23, Hpx28, Brg36 | | | Rip23 [6]
RipAL | UW551 RRSL_02221 | Rip38 | | Lipase domain | Rip38 [6]
RipAM | RSc3272 | Brg40 | | | This work, Additional file 3
RipAN | RSp0845 | Rip43, Hpx33, Brg33 | | | Rip43 [6]
RipAO | RSp0879 | Rip50, Hpx2, Brg34 | | | Rip50 [5]
RipAP | UW551 RRSL_04655 | Rip60 | | Ankyrin Repeats | Rip60 [6]
RipAQ | RSp0885 | Rip51, Brg35 | | | Rip51 [6]
RipAR | RSp1236 | Rip61 | | Ubiquitin ligase domain | Rip61 [6]
RipAS | RSp1384 | Rip66, Hpx9, Brg43 | | | Rip66 [6]
RipAT | RSp1388 | Rip57, Brg48 | | | Rip57 [5]
RipAU | RSp1460 | Rip68, Hpx8, Brg45 | | | Rip58 [6]
RipAV | RSp0732 | Rip39, Hpx27, Brg39 | HopAV | | Rip39 [6]
RipAW | RSp1475 | Rip69 | | Ubiquitin ligase domain | Rip69 [6]
RipAX1 | RSc3290 | Brg13 | HopH/XopG | |
RipAX2 | RSp0572 | Rip36, Brg14 | HopH/XopG | | Rip36 [6]
RipAY | RSp1022 | Rip55, Hpx21, Brg37 | | | Rip55 [6]
RipAZ1 | RSp1582 | Rip71 | | | Rip71 [6]
RipAZ2 | R. syzygii RALSY_20407 | | | |
RipBA / RipBB | RSc0227, RSp0228 [pseudogene] / Psi07 RPSi07_mp0573 | | AvrRpm1 | Ankyrin repeats |
RipBC | CFBP2957 RCFBP_mp30170 | | | YopJ acetyltransferase domain, Ankyrin Repeats |
RipBD | R. syzygii RALSY_20184 | | HopAF | |
RipBE | RS1000 Rip10 | Rip10 | XopAR | | Rip10 [6]
RipBF | Psi07 RPSi07_2863 | | HopV | |
RipBG | Molk2 RSMK00753 | | HopAB | Ubiquitin ligase domain |
RipBH | Psi07 RPSI07_mp1715 | | | Shigella flexneri OspD family |
RipBI | CFBP2957 RCFBP_mp30113 | | XopX | |
RipTAL | RSc1815 | Rip19, Hpx17, Brg11 | TAL | Putative transcription factor | Rip19 [6]
RipTPS | RSp0731 | | | Trehalose-phosphate synthase | Manuscript in preparation

A representative gene member for each family is provided (gene nomenclature from strain GMI1000 unless otherwise stated) with other names published in the literature. Homologous T3E from Pseudomonas syringae
sp. (Hop) or Xanthomonas sp. (Xop) are indicated. The last column lists T3E for which Type 3 secretion system-dependent secretion or translocation was experimentally demonstrated.
reconstruction for these homologous genes across the
whole species complex (Additional file 4) illustrates the
accuracy of the orthology attributions [30]. These
phylogenetic trees also highlighted the existence of two
paralogs in several strains that clearly belong to a clade
defined as an orthologous group (see Additional file 4,
for RipA5, RipE1, RipF1, RipG1 and RipH2). We believe
that these paralogs result from recent gene duplication
specific to a strain (or to a group of related strains). We thus
chose to name these genes in a way that indicates their
recent evolution: e.g. RipA5_1_Molk2 and RipA5_2_Molk2;
RipF1_1_CMR15 and RipF1_2_CMR15, etc. The rule of
synteny is conserved since we verified that all these genes
have indeed a conserved synteny (e.g. RipA5_1_Molk2,
RipA5_1_IPO1609, RipA5_1_UW551 and RipA5_1_Po82 have a
conserved genomic location, as do RipA5_2_Molk2,
RipA5_2_IPO1609, RipA5_2_UW551 and RipA5_2_Po82).

Suggested name reassignment of previously characterized 
R. solanacearum T3E. 

Whenever possible the proposed new nomenclature
conserves the original letter designations used in
previous annotation, e.g. RipP1 is PopP1 [19]; RipP2 is PopP2
[20]; RipAA is AvrA [31]. In the case of paralogous
genes, the names are, for instance: RipG1, RipG2, ... to RipG8
for the GALA gene family [16,17]; RipA1, RipA2, ... to RipA5
for the AWR family [23]. In a few cases, there is
evidence for recent T3E gene duplications resulting in two
or more gene copies in a single given strain, e.g. strain
Psi07 harbors 3 copies of RipG1 [17] and 2 copies of
RipH2: these were renamed RipG1_1, RipG1_2, RipG1_3
and RipH2_1, RipH2_2, respectively, to differentiate them
from the other RipH and RipG genes in this strain
(Table 1).

In addition, a Rip name is proposed for the 9 T3E
previously identified as Pop [20,32-36] or Avr [37]. The Pop
designation is historical and was coined when
R. solanacearum was known as Pseudomonas
solanacearum [38]; the "Avr" term was solely used for the
AvrA avirulence protein identified in 1990 [37]. These
designations can be confusing because the Pop term has
also been used to name some Pseudomonas aeruginosa
T3E [39] and AvrA also refers to an unrelated T3E from
Salmonella species [40].

"ScanYourGenome", a bioinformatic tool for detecting
T3E orthologs

In order to swiftly analyse the T3E content of newly
produced genome sequences, we developed a protocol for
the identification of putative effector candidates. This
pipeline is based on a de novo effectome prediction using
T3E models. Each candidate is then tested using different
methods of decreasing stringency to assign it to the
most probable known effector gene (see Methods section).



This protocol was first tested on the reference genomes
used above for manual annotation of the T3E genes, in
order to calibrate the detection parameters (see Methods),
before using it to predict T3E in the recently
published draft genomes of strains K60 [41], FQY_4 [42] and
Y45 [43]. This analysis predicted 60, 75 and 73 potential
T3E-encoding genes in the K60, FQY_4 and Y45 genomes,
respectively (Additional file 2).
The gene model prediction takes into account possible
frameshifts. In addition, when a gene is shorter than 80% of
the average length of the other alleles of this Rip gene,
the predicted gene is tagged as a potential pseudogene.
Both frameshift and pseudogene annotations appear in
the prediction. This orthology search engine and the
resulting Rip assignment are available to the
community for queries of draft or complete genome sequences.
For shorter gene sequences a more straightforward blast
is advised. The advantage of a sliding scale of orthology
detection is the possibility to unequivocally assign each
potential T3E gene to a specific orthologous group.
Whenever a new candidate T3E gene, experimentally
validated as being secreted or translocated into plant
cells, does not retrieve an already labelled orthologous
Rip family, this gene will be assigned the next available
Rip code.
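
The length-based pseudogene tag described above can be sketched as follows. Only the 80% threshold comes from the text; the function name, the choice to skip tagging when no other allele is available, and the example lengths are assumptions made for illustration, and the actual pipeline may differ.

```python
from statistics import mean
from typing import List


def is_potential_pseudogene(gene_length: int,
                            other_allele_lengths: List[int],
                            threshold: float = 0.80) -> bool:
    """Flag a predicted gene shorter than 80% of the mean allele length."""
    if not other_allele_lengths:
        # Assumption: without reference alleles there is nothing to compare.
        return False
    return gene_length < threshold * mean(other_allele_lengths)


# Hypothetical allele lengths (in nucleotides) for one Rip family.
reference_lengths = [1000, 1000, 1000]
truncated = is_potential_pseudogene(700, reference_lengths)
full_length = is_potential_pseudogene(850, reference_lengths)
```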

Evolutionary dynamics of rip genes 
Classification of paralogous rip genes 

A specific feature of R. solanacearum T3Es is the
abundance of paralogous rip genes in all the strains sequenced
to date. Some of these paralogous genes are well
represented in strains from the four phylotypes, hence they
probably originated from ancient duplications in the
common ancestor of these diverse strains. This was well
documented for the RipG1-G8 [17] and the RipA1-A5
[23] paralogous gene families and is probably also true
for RipH1-H3 and RipS1-S8. Although all strains contain
members of these paralogous families, the likely ancient
duplications do not exclude some phylotype
specificities explained by loss or, more simply, by recent
duplications, e.g. RipA1 and RipS6 seem to be specific to
phylotype 1; RipG8 is only found in CMR15, the sole
representative of phylotype 3; and RipH4 seems to be
specific to the phylotype 4 strains (see Additional file 2).

A second group of paralogous rip genes is characterised
by a smaller number (2-3) of paralogous sequences in a
given strain. Phylogenetic analyses were used to estimate
the evolutionary relationships between paralogues using
sequence data from the 11 RSSC representative strains.
We defined eight additional rip genes (RipC2, RipE2,
RipF2, RipO2, RipV2, RipAF2, RipAX2 and RipAZ2)
(Table 1 and Additional file 4). Several of these
paralogous genes, such as ripC2 or ripO2, seem to differ
significantly from RipC1 and RipO1 respectively and could
have originated through lateral gene transfer (see below)
since homologous genes exist in other bacterial species.
For the gene families present in most of the RSSC
strains (ripE2 and ripV2), the genes are located in each
strain in a similar genomic context, an observation
which also supports a common evolutionary origin. However,
the distribution of some paralogs can vary among
strains: e.g. RipE1 seems to be ubiquitously present
whereas RipE2 is absent in phylotype 1 strains.

Protein sequence analyses indicated that RipAR, RipAW,
RipV1, RipV2 and RipBG contain putative ubiquitin-ligase
domains (see below). Likewise, RipJ, RipK, RipAE, RipBC,
RipP1 and RipP2 could all potentially display
acetyltransferase activity (see phylogenetic tree in Additional
file 5). Notwithstanding this apparent functional
conservation, the sequences of these T3E genes have diverged
significantly and cannot be assigned to orthologous groups.
It should be noted that the numerical identification of
RipP1 and RipP2, and of the pseudogene RipP3_GMI1000, is
used in reference to their previous names PopP1 [7,36]
(RipP1), PopP2 [20,22,44] (RipP2) and PopP3 [19]. This is
an exception to the previous rule as we do not consider
these to be paralogs.

Horizontally acquired rip genes 

The detection of horizontal gene transfer (HGT hereafter)
events in a given bacterial genome can be performed
retrospectively through bioinformatics-based
comparative analyses [45]. A frequent hallmark of genes with an
extrinsic origin is the difference in GC content of these
genes compared with the mean content of the host
genome [46,47]. Thirteen Rip genes exhibit a mean GC%
below 60% (whereas the genomic mean content in RSSC
strains is 67%) (Additional file 6). In several cases,
the T3E gene is physically associated with insertion
sequence elements (RipAA, RipAX1, RipO2, RipE2) or
integrases (RipAF2), or is part of prophage sequences
integrated in the genome (RipP1, RipP2, RipT, RipAG,
RipAX2, RipE2, RipBD). From these observations, we
infer that bacteriophage-mediated transfer is likely
an efficient means of lateral transfer of these T3E
in the RSSC.
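
The GC-content screen can be illustrated with a few lines of code. This is a sketch only: the 60% cutoff mirrors the figure quoted above, while the function names and the toy sequences are hypothetical, and a low GC value is a hint of foreign origin rather than proof.

```python
def gc_percent(sequence: str) -> float:
    """GC content in percent, ignoring any character other than A, C, G, T."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def has_low_gc(sequence: str, cutoff: float = 60.0) -> bool:
    """True when the gene falls below the cutoff used in the text (60%)."""
    return gc_percent(sequence) < cutoff


# Toy sequences: one at 50% GC and one at 87.5% GC.
candidate = "GGCCAATT"
typical = "GGGCCCGA"
```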

Phylogenetic analyses also provided interesting insights
into possible HGT with other bacterial plant pathogens.
For example, RipC2_CFBP2957, an outgroup of the RipC1 gene
family, could derive from the XopC T3E from
Xanthomonas spp. Furthermore, the low GC content of ripC2_CFBP2957
(61%) supports the hypothesis of an HGT, with the
possibility of a shared common ancestor between
ripC2_CFBP2957 and xopC. Similar observations can be
made with RipO2 of R. syzygii R24 (and P. syringae pv.
phaseolicola HopG1), RipAF2 of R. syzygii R24 (and P. syringae
HopF1), RipE1 (and P. syringae HopX1 and Xanthomonas
spp. XopE), RipP1 (and Xanthomonas spp. XopJ), RipAX2
(and Xanthomonas gardneri XopG and P. syringae
HopH1) and RipH2 (and Xanthomonas sp. XopP), see
Additional file 4. Together with RipTAL, already
suspected of inter-species transfer [48,49], this analysis
thus provided a total of seven T3E genes that could have
been acquired through HGT.

Evidence of phylogenetic incongruences 

Examination of the intra-family phylogenetic
relationships of T3E genes distributed in the nine RSSC
sequenced strains revealed in some cases
incongruences with the species phylogenetic tree. This can be
illustrated by individual Rip contradicting the species
phylogeny, like RipG7_CMR15 [17], RipD_CMR15, RipH2_1_Po82
and RipAX1_Po82, which could be indicative of rapidly
evolving or horizontally acquired genes (Additional file 4).
Some other conflicting phylogenies cannot be directly
associated with a single divergent gene. This is the case for
RipI, RipU and RipAC, which are present in most of the
RSSC sequenced strains (especially RipAC and RipU) but
with great sequence divergence (identity at the protein
level falling under 30% between some RipU and RipAC
alleles). The only strong evidence for them being
orthologs is the fact that RipAC and RipU genes are located
in two highly syntenic regions with their respective
flanking genes strictly following the species phylogeny
(Figure 2). This suggests that RipU and RipAC evolved
faster in some strains (e.g. RipU_CMR15), resulting in this
particularly high sequence polymorphism [50].

Another example of discrepancy between species and 
gene phylogeny is for RipAA. Here the increased poly- 
morphism is correlated with the presence of a hypervari- 
able domain consisting of Variable Number of Tandem 
Repeats [31]. 

Several rip genes underwent selection and recombination 

After excluding the likely pseudogenes from the datasets, 
all Rip genes with more than 3 orthologs (75 out 
of 93 Rip genes) were analysed for traces of recurrent 
diversifying positive selection. The analysis performed 
here was carried out as described previously [17], 
except that gene phylogenies were inferred using the 
one-ratio codon model M0 [51]. The full results are 
displayed in Additional file 7. Considering that some of 
the datasets were rather small, we concentrated on 
identifying Rip genes with strong indications of positive 
selection. This was the case for the nine following Rip 
genes: RipAA, RipAJ, RipAT, RipAW, RipBD, RipD, RipG7, 
RipH3 and RipS7, with three out of three likelihood 
ratio tests (LRTs) for positive selection being significant 
(Table 2). Six out of these 9 Rip genes have an estimated 
proportion of sites under positive selection higher than 
5%, with the highest level reached for RipAJ and RipG7, 
in agreement with a previous analysis [17]. 

Figure 2 The RipAC and RipU T3E loci are incongruence hotspots. A. Genomic map of the ripAC locus in representative strains of the four 
phylotypes from the RSSC and phylogenetic relationships of ripAC and its flanking genes. Arrows of the same colour symbolize orthologous genes. 
B. Similar analysis as above for ripU. RSSC strains are color-coded according to their phylotype group: red for phylotypes 1 and 3, blue for 
phylotype 2 and green for phylotype 4 and related strains. 



Importantly, the presence of a high degree of 
recombination can hamper LRTs for positive diversifying 
selection, leading to false positives [52]. However, 
inference of recombination can also be affected by 
selection forces [53,54]. This is why we systematically 
analysed all data for evidence of recombination (see 
Additional file 7 for full results). Table 2 also displays 
the results of tests for recombination for the nine 
previously identified Rip genes. Among these, only two 
(RipAW and RipG7) could also be affected by recombination, 
while for RipAA the evidence of recombination is not 
clear-cut. The interplay between selection and 
recombination was already disentangled previously for 
RipG7 [17], with the conclusion that there is indeed a 
strong likelihood of positive selection acting on this 
gene. Here we will not address the question further for 
RipAA and RipAW, but a future analysis with more allelic 
variants should be informative. 

It is interesting to note that in the multigene paralogous 
families there seems to be one member under positive 
selection: RipH3, RipS7, RipG7. When we consider only 
2 out of 3 LRTs for positive selection (see Additional 
file 7), we can define 14 more Rip coding sequences 
with evidence for positive selection, out of which 9 belong 
to the above-mentioned paralogous families (including 
RipA5). It is tempting to speculate that after duplications 
some of the paralogous genes could have undergone 
sub- or neo-functionalisation allowing the cognate Rip 
proteins to adapt to evolving plant targets or evade 
host immunity. 

Table 2 Rip coding sequences under strong diversifying positive selection on the protein level 

Columns 5 to 8 give LRT statistic values for codon model pairs (†). Columns 9 to 12 give proportions of sites in different selection regimes (‡). 

| T3E gene | Number of strains | Alignment length (nt) | Population recombination rate, N_e r (P_LPT)* | M0 vs M3 | M1a vs M2a | M7 vs M8 | M8a vs M8 | Strict negative (ω < 0.15) | Relaxed negative (0.15 < ω < 0.9) | Neutral (0.9 < ω < 1) | Positive (ω > 1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RipAA | 10 | 906 | 10 (0.33) | 134.9 | 6.2 | 14.4 | 13.1 | 38% (ω=0.04) | 56% (ω=0.48) | 0% | 6% (ω=2.9) |
| RipAJ | 11 | 936 | 2 (0.54) | 150.5 | 7.2 | 11.9 | 13.3 | 46% (ω=0.04) | 46% (ω=0.49) | 9% | 7% (ω=3.1) |
| RipAT | 9 | 1764 | 0 (0.04) | 191.1 | 11.5 | 17.3 | 13.3 | 38% (ω=0.04) | 48% (ω=0.54) | 10% | 5% (ω=3.3) |
| RipAW | 6 | 1359 | 6 (0.00) | 177.6 | 9.4 | 15.0 | 20.6 | 38% (ω=0.02) | 48% (ω=0.51) | 10% | 5% (ω=4.9) |
| RipAP | 7 | 2400 | 0 (0.22) | 148.6 | 18.2 | 21.7 | 27.6 | 59% (ω=0.02) | 29% (ω=0.48) | 10% | 2% (ω=10.0) |
| RipD | 11 | 1971 | 3 (0.18) | 266.5 | 7.2 | 16.7 | 12.7 | 38% (ω=0.04) | 47% (ω=0.47) | 9% | 6% (ω=2.8) |
| RipG7 | 10 | 2016 | 10 (0.00) | 561.6 | 28.1 | 37.5 | 43.0 | 46% (ω=0.04) | 28% (ω=0.59) | 19% | 7% (ω=3.4) |
| RipH3 | 9 | 2229 | 4 (0.81) | 145.7 | 19.5 | 29.5 | 25.1 | 29% (ω=0.07) | 57% (ω=0.56) | 10% | 4% (ω=3.8) |
| RipS7 | 7 | 9570 | 0 (0.00) | 329.0 | 30.4 | 34.3 | 42.2 | 48% (ω=0.03) | 38% (ω=0.44) | 10% | 5% (ω=4.0) |

*Values supporting evidence for recombination are shown in bold. 

†For the presented 9 genes all three LRTs for positive selection were significant, as well as the LRT comparing M0 vs M3 supporting strong variability of selection pressure among sites. Codon models are as described 
in [74]. 

‡Estimates of selection regimes are according to model M8 if the LRT comparing M8a and M8 was significant. Otherwise, selection regimes are reported according to model M8a. For strict and relaxed negative selection, 
the average omega value over respective selection classes is shown. Note that percentages for the four categories do not always add up to 100% due to rounding. 

Comparative genomics and functional implications 

The RSSC T3E core set: a large group of conserved effectors. 

The establishment of a near-complete T3E repertoire in 
strains representative of the large phylogenetic diversity 
of the RSSC allows a more specific and accurate 
comparison than those based on comparative genomic 
hybridizations [12]. We performed T3E repertoire 
comparisons using the following criteria: (i) rip genes 
listed as pseudogenes in the database were considered 
non-functional, but those listed as containing frameshifts 
were considered as functional genes. The assumption that 
all the frameshifts are due to sequencing errors is 
probably an overestimation. Since we cannot validate this 
experimentally, and considering that the number of 
frameshifts identified is inversely correlated with the 
genomic sequence quality, we will keep this assumption. 
This is exemplified by the high-quality GMI1000 and 
CFBP2957 genomes, which do not contain a single frameshift 
mutation in their T3E genes. (ii) The 16 hypothetical T3E 
newly identified in the different strains were also 
included in the repertoire for comparisons. 

The RSSC is divided into three main phylogenetic clades 
corresponding to phylotypes 2, 4 and (1 + 3) [1,11]. A 
first comparison showed that 22 Rip gene families are 
present in the 11 strains studied. When the presence 
requirement is lowered to 10 out of the 11 strains, the 
number of gene families reaches 32 (Additional file 8). 
Considering that the loss of specific T3E genes 
in some strain lineages is possible (see for instance the 
significantly reduced repertoire of R. syzygii R24 or 
BDBR229), we believe that these 32 T3E are a good 
estimation of the subset of T3E probably present in 
the ancestral R. solanacearum strain. Interestingly, 5 
out of the 9 T3E gene families showing a strong signal of 
diversifying selection also belong to the core effector 
group (Figure 3). It is also interesting to notice that 
distinct members of paralogous Rip families (RipA, RipG 
and RipH) are also conserved among the 11 analyzed 
strains, indicating that duplications followed by 
differential evolution of these genes took place early, 
before phylotype divergence [17]. The estimate of 32 core 
T3E certainly reflects the abundance of T3E in 
R. solanacearum and, considering its genetic diversity as 
a species complex, appears significantly higher than the 
core list identified in P. syringae, which is only 5 among 
19 strains [13]. The R. solanacearum ancestor presumably 
possessed more than 20 T3E, which were possibly acquired 
from the bacterial and phage communities in the soil or 
aquatic reservoirs. 

Figure 3 Grouping of T3E rip genes. Circles group (i) genes 
under strong positive selection (see Table 2) and (ii) genes 
belonging to the core group of T3E conserved in 10 out of the 
11 RSSC genome sequences. 
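
The two presence thresholds used for the core set (all 11 strains, or at 
least 10 out of 11) amount to counting, for each Rip family, the number of 
strains that carry it. The Python sketch below illustrates that counting 
only. The strain names and repertoires are hypothetical toy data, and the 
function name core_families is an illustrative assumption; it is not part 
of the published pipeline or database. 

```python
from collections import Counter


def core_families(repertoires, min_strains):
    """Return the Rip families found in at least `min_strains` strains.

    `repertoires` maps a strain name to the set of Rip families that are
    considered present (functional) in that strain.
    """
    counts = Counter(
        family for families in repertoires.values() for family in set(families)
    )
    return sorted(f for f, n in counts.items() if n >= min_strains)


# Hypothetical toy repertoires (NOT the published data).
toy_repertoires = {
    "strainA": {"RipA1", "RipB", "RipG7", "RipU"},
    "strainB": {"RipA1", "RipB", "RipG7"},
    "strainC": {"RipA1", "RipB", "RipU"},
}

# Strict core: present in every strain. Relaxed core: one absence tolerated.
strict_core = core_families(toy_repertoires, min_strains=3)
relaxed_core = core_families(toy_repertoires, min_strains=2)
print(strict_core)
print(relaxed_core)
```

Tolerating one absence, as done for the 32-family estimate, keeps families 
that were lost in a single lineage from being excluded from the core. 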

T3E repertoire comparisons provide no clues on host 
specificity determinants 

Phylotypes 1 + 3, 2 and 4 are the main genetic groups 
structuring the RSSC [1,11]. A comparison of the T3E 
repertoires (also taking into account the 16 candidate 
genes) from GMI1000 (phylotype 1), CFBP2957 (phylotype 2) 
and Psi07 (phylotype 4), representing the three 
species clades and all isolated from tomato, reveals a 
diversity of 100 T3E genes, almost half of which (47) are 
conserved among the three strains whereas one third 
(30 T3E) appears to be strain-specific (Figure 4A). This 
confirms that a majority of T3E are widely conserved in 
this species complex but also shows that the strain 
repertoires are diversified, as observed in P. syringae 
[13] or Xanthomonas sp. [55]. 

Figure 4 T3E distribution in RSSC strains. RSSC strains are color-coded according to their phylotype group: red for phylotypes 1 and 3, 
blue for phylotype 2 and green for phylotype 4 and related strains. A. Shared T3E between representative strains of the three main 
phylogenetic lineages of the RSSC, all isolated from tomato. B. Shared T3E between R. solanacearum strains belonging to phylotype 2. The almost 
identical repertoires from strains IPO1609 and UW551 were merged for this comparison. C. Shared T3E between strains belonging to phylotype 4. 

R. solanacearum strains exhibit great variations in 
host range [4] and it is tempting to speculate that T3E 
repertoires shape these host range capabilities. In order 
to tentatively identify candidate genes involved in host 
specificity, we performed T3E repertoire comparisons 
within specific phylogenetic groups such as phylotype 2 
or 4 using strains with marked host range differences 
(Figure 4B). These comparisons identified strain-specific 
genes but did not pinpoint strong host-specificity 
candidates. Indeed, none of the Molk2-specific T3E is 
shared with the BDBR229 strain, which is also pathogenic 
on banana; the same is true for potato-associated 
T3E genes from the Po82 and UW551/IPO1609 strains. 
Although more genomic sequences of RSSC strains are 
needed to perform robust associations between host 
range and T3E repertoires, these observations already 
suggest that host range may be controlled by multiple 
or differential combinations of T3E determinants, or by 
determinants other than T3E, or that differences in 
T3E protein sequence or gene expression might also be 
involved [10]. Similar observations were reported for 
the comparison of T3E repertoires among P. syringae 
pathovars [56], thus reinforcing the idea that a complex 
genetic basis underlies host range evolution in plant 
pathogens. 

Finally, intra-phylotype comparisons suggest that the 
proportion of conserved T3E is higher in phylotype 2 
than in phylotype 4 strains (Figure 4C). Although 
phylotype 4 strains BDBR229 and R24 have undergone gene 
reduction potentially affecting this comparison, we still 
believe that this difference reflects the higher genetic 
diversity within phylotype 4 [9] and could also be 
associated with the diverse lifestyles among phylotype 4 
strains [11]. 



Identification of novel T3E genes harboring putative 
ubiquitin-ligase domains 

Molecular functions of most R. solanacearum T3E remain 
unknown, and more than half of the repertoire 
corresponds to proteins with no structural motif or domain 
suggestive of function. The search for functional motifs 
identified two T3E proteins, RipAR and RipAW, carrying 
a C-terminal domain structurally related to the Shigella 
flexneri IpaH ubiquitin ligase domain [57]. Although the 
overall similarity between IpaH and RipAR/RipAW is 
low, these R. solanacearum T3E have a C-terminal 
domain with a predicted structure consisting of 12 
alpha-helices, as determined for IpaH family proteins [57]. 
Most of the highly conserved residues in the IpaH family, 
including a highly conserved cysteine residue essential for 
activity [57], are conserved in RipAR and RipAW (see 
sequence alignment in Additional file 9). Considering 
the previously identified T3E RipV, a Salmonella SspH1 
homologue [58], and the RipG family members [16], 
R. solanacearum potentially harbors a total of 10 T3E 
endowed with potential ubiquitin-ligase activity. This 
highlights the probable central mechanism consisting 
in subversion of the host's ubiquitination system by 
T3E during plant pathogenesis [59,60]. 

The specific case of the RipF translocon proteins 

The RSSC T3E list includes RipF proteins (formerly PopF 
[35]) as substrates of the T3SS since they were identified 
as translocated into plant cells using the adenylate 
cyclase reporter assay [6]. RipF proteins are required for 
the translocation of other T3E and are T3SS translocator 
proteins presumably acting at the tip of the Hrp pilus 
and inserting into host cell membranes to permit T3E 
translocation [35,61]. Contrary to the structural 
components of the T3SS (including the Hrp pilus structural 
pilin), which are strongly conserved among all the strains 
from the RSSC analyzed to date, a comparative analysis 
of RipF revealed major differences among the currently 
sequenced RSSC strains. Strains belonging to phylotypes 
1, 2 and 3 possess two RipF whereas strains from 
phylotype 4 have only one (RipF1), as in Xanthomonas spp. 
In phylotypes 1 and 3 the second gene, formerly named 
PopF2 [35], is phylogenetically close to the first one, 
named PopF1. However, in phylotype 2, the second gene 
product belongs to a distinct phylogenetic branch, 
suggesting an ancient divergence from the other RipF1/ 
PopF1 lineage. These observations incited us to rename 
GMI1000 PopF2 as RipF1_2 (PopF1 being RipF1_1), 
whereas RipF2 is proposed to designate the gene from 
phylotype 2 (see Figure 5). This peculiar evolutionary 
history makes the RipF family one of the most 
stringent discriminating probes among all Rip genes 
for distinguishing the three main phylotype groups of 
the RSSC. 

The biological implications of this gene duplication of 
the RipF translocator in some RSSC lineages and the 
structural divergence between the RipF1/RipF2 family 
members are unknown. In GMI1000, RipF1_1 has a major 
role in T3E translocation in tomato and tobacco whereas 
RipF1_2 plays a minor role in this process on these 
hosts [35]. The specific involvement of RipF2 and RipF1 
in pathogenicity of phylotype 2 strains will need to be 
addressed in future studies. 

Conclusion 

T3E are essential to R. solanacearum pathogenesis but 
progress in understanding their relative contribution 
to disease through reverse genetic approaches has been 
hampered by functional redundancies, 
due to the existence of large T3E repertoires. In this 
study, we have undertaken groundwork for a global 
inventory of R. solanacearum T3E at the species level in 
order to provide the community with a curated dataset, 
tools and a rationalized nomenclature that should pave 
the way for future work on RSSC effectomics. We 
conducted a large-scale approach aimed at the 
identification, expert annotation and phylogenetic analysis 
of T3E from the RSSC, a species complex showing considerable 
genomic diversity [10,11] and responsible for one of 
the most devastating bacterial diseases of plants worldwide 
[2]. Our search yielded a total of 94 T3E Rip genes and 16 
additional candidate T3E genes distributed among the 11 
genomes analyzed in this study. This total of more than 
100 predicted T3Es is significantly higher than the T3E 
inventories from other bacterial plant pathogens. Indeed, 
in P. syringae, genome analysis of 19 phylogenetically 
diverse isolates revealed the existence of 58 T3E genes [13] 
(the online resource www.pseudomonas-syringae.org 
references 61 Hop orthologous groups), whereas this 
number is estimated at 52 in Xanthomonas spp. [55]. 
These comparisons highlight the great diversity of T3E 
genes present in the RSSC and the apparent complexity of 
T3SS-dependent pathogenesis in this species complex. 

Figure 5 PhyML phylogenetic reconstruction of the RipF family. The XopF from Xanthomonas 
arboricola [GenBank: AFVSOIOS] is also included in this analysis. R. solanacearum GMI1000 RipF1_1 and RipF1_2 correspond to former PopF1 
and PopF2. 

The RSSC T3E repertoire also appears to be highly dynamic, 
as evidenced by the number of T3E under positive 
selection, indicative of possible neo-functionalization, 
and by the number of T3E pseudogenes identified in this 
study. In particular, there is an obvious tendency to T3E 
gene decay in R. syzygii, which is correlated with the 
genome reduction in this strain [11]. R. syzygii is an 
exception among the RSSC since it is strictly limited to 
the clove tree; the T3E repertoire reduction in this strain 
may be a consequence of this host specialization. On the 
other hand, the cornucopia of T3E identified in 
R. solanacearum and other related pathogenic 
beta-proteobacteria is probably a factor explaining the 
exceptional adaptation of these pathogens to such a wide 
diversity of hosts. Importantly, phylogenetic analyses 
allowed the definition of novel T3E genes, resulting in 
the definition of new Rip gene orthologous groups or 
paralogs of already identified Rip genes. It is 
conceivable that these newly defined groups correspond to 
T3E genes with novel functional specificities. 

Our analysis should also be helpful for refined 
functional studies: (i) the RipF1-RipF2 translocon proteins 
appear as major discriminating determinants among the 
main lineages of the RSSC and this probably reflects a 
fundamental evolutionary divergence; (ii) global 
comparisons of repertoires among genetically diverse 
strains identified a set of 20-30 core T3E widely 
distributed in the species, which could presumably be 
considered as ancestral T3E important in the interaction of 
the pathogen with its hosts; (iii) the identification of 
T3E displaying a positive selection pattern may provide 
hints on the determinants evolving under plant selection 
pressure; and (iv) our bioinformatics pipeline is designed 
to rapidly predict and assign Rip identifiers to all 
homologous T3E genes in newly sequenced strains of 
the RSSC. 

Methods 

Data sources 

General information on the features of the 14 strains of 
the RSSC and the corresponding genome sequences 
used for T3E mining is provided in Additional file 10. 
These strains are representative of the RSSC in terms of 
host range, worldwide geographic origin and phylogenetic 
distribution [10,11]. 



T3E inventory and annotation in RSSC genomes 

PatScan searches [62] for the hrpII box element 
(TTCGn16TTCG) were performed in RSSC genomes 
using the criteria previously used [24], i.e. one mismatch 
allowed, considering only hits in the 500 bp region 
upstream of a start codon. Analysis of the 50 amino acid 
N-terminal domain of candidate T3E for detection of a 
T3SS-dependent export pattern was made using the 
criteria defined previously [5], which considered as 
positive an N-terminal domain meeting at least two out of 
the three following rules: (i) content in Serine + Proline >30%, 
(ii) content in Leucine <10% and (iii) absence of acidic 
residues within the first twelve amino acids. 
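
The two screening criteria above can be sketched in a few lines of Python. 
This is an illustration only and does not reproduce PatScan or the scripts 
used in this study. It assumes that mismatches are counted over the eight 
conserved bases of the box, that percentages are computed over the 50 
N-terminal residues, and that acidic residues are Asp (D) and Glu (E). The 
function names and test sequences are hypothetical. 

```python
HALF_SITE = "TTCG"   # conserved half-site of the box
SPACER = 16          # number of unconstrained bases (n16)
ACIDIC = set("DE")   # aspartate and glutamate (assumed definition)


def find_hrpII_boxes(upstream, max_mismatch=1):
    """Return 0-based offsets of TTCG-n16-TTCG matches in `upstream`.

    `upstream` is expected to be the (at most) 500 bp region preceding a
    start codon. Mismatches are counted over the 8 conserved bases only.
    """
    seq = upstream.upper()
    width = 2 * len(HALF_SITE) + SPACER
    hits = []
    for i in range(len(seq) - width + 1):
        left = seq[i:i + len(HALF_SITE)]
        right = seq[i + len(HALF_SITE) + SPACER:i + width]
        mismatches = sum(a != b for a, b in zip(left + right, HALF_SITE * 2))
        if mismatches <= max_mismatch:
            hits.append(i)
    return hits


def has_export_pattern(protein, window=50):
    """Apply the 'two out of three rules' test to the N-terminal window."""
    nterm = protein[:window].upper()
    if not nterm:
        return False
    ser_pro = (nterm.count("S") + nterm.count("P")) / len(nterm)
    leu = nterm.count("L") / len(nterm)
    rules = [
        ser_pro > 0.30,                  # (i) Ser + Pro content above 30%
        leu < 0.10,                      # (ii) Leu content below 10%
        not (set(nterm[:12]) & ACIDIC),  # (iii) no D/E in first 12 residues
    ]
    return sum(rules) >= 2


# Hypothetical sequences used only to exercise the functions.
exact_box = "A" * 10 + "TTCG" + "A" * 16 + "TTCG" + "A" * 10
one_mismatch = "A" * 10 + "TTCG" + "A" * 16 + "TTCC" + "A" * 10
two_mismatches = "A" * 10 + "TACG" + "A" * 16 + "TTCC" + "A" * 10
effector_like = "MSPSSPTRSPSAPSQSSPAA" + "GA" * 15
non_effector = "MDELLLLDELLLL" + "LA" * 20
```

With these inputs, the exact and single-mismatch boxes are both reported at 
offset 10, the double-mismatch sequence yields no hit, and only the 
serine/proline-rich N-terminus passes the export test. 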

Prediction of T3E start codon. 

We observed a great heterogeneity among the predicted 
start codons for many T3E families in the RSSC annotated 
genomes deposited at GenBank. When possible, 
multiple sequence alignments of the regions located 
downstream of the hrpII box element were performed to 
predict the most probable start codon, which was defined as 
the most distal 5' initiator codon conserved among the 
different strain sequences. 

Frameshift and pseudogene prediction 

T3E genes were annotated as frameshift in two cases: 
(i) when several contiguous open reading frames dis- 
played homology to a defined Rip gene sequence (thus 
resulting in the annotation of two or multiple gene 
fragments), and (ii) when the T3E gene sequence was 
located on a contig border (thus resulting in the anno- 
tation of a T3E gene fragment). 

T3E genes were defined as pseudogenes in the following 
situations: (i) the structure of the T3E gene was strongly 
altered, with a gene size <50% of that of other known 
alleles, or with a deletion of the N-terminal domain 
necessary for T3SS-dependent translocation, (ii) the T3E 
gene open reading frame was disrupted by the insertion of 
an IS element, or (iii) there was experimental evidence 
that the T3E gene product is not translocated or secreted 
by the T3SS. 

Detection of candidate effectors in sequenced genomes 
using "ScanYourGenome" 

The first step of the pipeline we developed to detect 
putative effector candidates is a de novo proteome 
prediction. To achieve this, we run a blastx of the genome 
against the T3E proteins and use these data as input 
to the prokaryotic gene predictor FrameD [63]. This 
tool is run twice with the T3E nucleic coding sequences 
as model: the first pass is done with a high frameshift 
penalty score and the second one with a lower one, 
allowing frameshift and pseudogene prediction. To 
ensure the completeness of this new effectome, we add 
translated regions matching a T3E member according 
to the blastx results. 

The second step of the pipeline is the search for a 
homologous T3E member for each candidate. In order to get 
the best precision, we run different methods and synthesise 
the information, taking into account the specificity of 
each method and its parameters. 

The first method is the search for homology using a 
modified version of the OrthoMCL [64] pipeline. The 
modifications used are: filter inactivation in the blastp 
pre-process with default parameters, and stepwise decrease 
of the percent match cutoff (from 90% to 60%) in 
ortholog clustering in order to retrieve shorter 
pseudogenes. The best blastp, hmmscan and tblastn hits 
are then kept, in that order, to complete the OrthoMCL 
assignation or to remove the ambiguity of multiple 
assignations, especially in the case of paralogous gene 
families. 

The results are ordered according to the stringency 
of the method (OrthoMCL90 > OrthoMCL80 > 
OrthoMCL70 > OrthoMCL60 > blastp > HMMscan > tblastn). 
It is also indicated whether a frameshift mutation was 
introduced to produce a better homologous sequence. If 
the candidate gene is shorter than 80% of the average 
length of the cognate Rip gene, then the gene is tagged 
as a candidate. 
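
The ordering and length rule above can be illustrated with the short Python 
sketch below. The pipeline itself is written in Perl, so this is not its 
code. The data structures, function names and example hits are hypothetical 
and only mirror the stringency order and the 80% length threshold stated in 
the text. 

```python
# Methods listed from most to least stringent, as stated in the text.
STRINGENCY = [
    "OrthoMCL90", "OrthoMCL80", "OrthoMCL70", "OrthoMCL60",
    "blastp", "HMMscan", "tblastn",
]
RANK = {method: i for i, method in enumerate(STRINGENCY)}


def best_assignment(hits):
    """Pick the (method, rip_family) hit obtained by the most stringent method.

    `hits` is a list of (method, rip_family) tuples; unknown methods are
    ignored. Returns None when no usable hit is available.
    """
    known = [hit for hit in hits if hit[0] in RANK]
    if not known:
        return None
    return min(known, key=lambda hit: RANK[hit[0]])


def is_short_candidate(gene_length, family_lengths, threshold=0.80):
    """True if the gene is shorter than 80% of the family's average length."""
    average = sum(family_lengths) / len(family_lengths)
    return gene_length < threshold * average


# Hypothetical hits for one predicted gene.
example_hits = [("tblastn", "RipG7"), ("OrthoMCL70", "RipG6"), ("blastp", "RipG7")]
print(best_assignment(example_hits))
print(is_short_candidate(700, [1000, 1000]))
```

Ranking by method rather than by raw score keeps an assignment supported by 
ortholog clustering ahead of one supported only by a sequence similarity 
hit. 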

This pipeline, written in Perl, is available through the 
T3E web interface and all parameters are available on 
demand. 

Phylogeny 

Rip sequences were aligned using the ProGraphMSA 
program, which implements evolution-aware alignment 
[65,66]. This program performs well with indel-rich data 
as well as with variation in tandem repeats such as 
leucine-rich repeats, as is often the case here. All 
phylogenies were reconstructed using a fast maximum 
likelihood (ML) heuristic search. For all individual Rip 
genes we captured information from both nonsynonymous and 
synonymous sites by using tree searches under codon model 
M0 [67] using CodonPhyML [51]. 

Since phylogenies for paralogous gene families described 
much more diverse datasets, they were reconstructed 
under the amino acid model LG [68] with Γ-rate variation 
among sites [69], as implemented in PhyML v3.0 [70]. 
Branch supports were estimated using the aBayes method, 
which is fast, accurate and has performance comparable 
with the Bayesian method [71]. Phylogenetic trees were 
produced using the online software iTOL [72]. 

Analysis of selection pressures 

Selection pressures were analysed on T3E gene datasets 
containing three or more orthologs. Selection pressures 
on T3E genes were evaluated using Markov models of 
codon substitution, and three pairs of likelihood ratio 
tests (LRTs) were used to detect positive selection as 
previously described [17]. 
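
For readers unfamiliar with these tests, the Python sketch below shows how 
an LRT statistic such as those reported in Table 2 can be converted into a 
p-value. It is not the analysis code used in this study. The degrees of 
freedom are the ones conventionally used for these site-model pairs and are 
assumptions here: 4 for M0 vs M3 (with three site classes), 2 for M1a vs 
M2a and for M7 vs M8, and a 50:50 mixture of a point mass at zero and a 
chi-square with 1 degree of freedom for M8a vs M8. 

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Chi-square survival function, closed forms for df in {1, 2, 4}."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1, 2 or 4 are supported in this sketch")


def lrt_pvalue(stat, pair):
    """p-value for a model pair, under the assumed null distributions."""
    if pair == "M0_vs_M3":
        return chi2_sf(stat, 4)
    if pair in ("M1a_vs_M2a", "M7_vs_M8"):
        return chi2_sf(stat, 2)
    if pair == "M8a_vs_M8":
        # 50:50 mixture of a point mass at 0 and chi-square with df = 1.
        return 0.5 * chi2_sf(stat, 1)
    raise ValueError("unknown model pair: " + pair)


# Example with a statistic of 6.2 for the M1a vs M2a comparison.
p_m1a_m2a = lrt_pvalue(6.2, "M1a_vs_M2a")
print(round(p_m1a_m2a, 4))
```

Under these assumptions a statistic of 6.2 for M1a vs M2a falls just below 
the 5% threshold, which illustrates why the smaller datasets call for a 
cautious reading of individual tests. 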

Testing for recombination 

The same data used for the selection pressure analysis 
were used to estimate the population recombination rates 
using the approximate-likelihood coalescent method and 
permutation test [73] as previously described [17]. 

Availability of supporting data 

All the data presented in this work and supporting our 
analysis are available in a publicly accessible database 
that has been set up and will be maintained by us. 

https://iant.toulouse.inra.fr/T3E is a website designed 
to provide the user with convenient and straightforward 
access to all the underlying data. 

GenBank accessions 

Out of the 841 Ralstonia solanacearum accessions used 
in this study, we have submitted 42 new accessions and 
proposed the modification of the annotation of 289 other 
individual T3E gene accessions to GenBank. All the GenBank 
accessions appear on the database webpage (under data/ 
supplementary data) and also as Additional file 11. 

Additional files 

Additional file 1: Table displaying the additional 16 T3E candidates 
in the RSSC. 

Additional file 2: List of T3E genes identified in the 11 strains of 
the RSSC used in this study. The result of "ScanYourGenome" on three 
additional strains (K60, FQY_4 and Y45) is also presented. 

Additional file 3: Experimental validation of type III dependent 
secretion of RipAM. 

Additional file 4: Phylogenetic reconstruction for all paralogous 
T3E genes together with selected homologs from other bacteria. 

Additional file 5: Phylogenetic tree reconstruction of T3E with 
proven (YopJ, RipP2_GMI1000) and possible acetyl-transferase activity. 

Additional file 6: List of T3E orthologues with GC% bias and 
association with mobile elements. 

Additional file 7: Table displaying the calculated positive selection 
and recombination probabilities for the whole T3E dataset. 

Additional file 8: List of the 32 core T3E presented in this study. 

Additional file 9: Sequence alignment of RipAR and RipAW C-terminal 
domains with IpaH ubiquitin ligases. 

Additional file 10: List and features of RSSC strains and 
corresponding genomic sequences used in this study. 

Additional file 11: List of all accessions used in this work. 

Abbreviations 

BDB: Blood disease bacterium; HGT: Horizontal gene transfer; IS: Insertion 
sequence; LRT: Likelihood ratio test; ML: Maximum likelihood; RIP: Ralstonia 
injected protein; RSSC: Ralstonia solanacearum species complex; T3E: Type III 
effector; T3SS: Type III secretion system. 

Competing interests 

The authors declare no competing interests. 



Peeters et al. BMC Genomics 2013, 14:859 
http://www.biomedcentral.com/1471-2164/14/859 






Authors' contributions 

NP, SC and SG designed the study. SC built and structured the 
bioinformatics pipeline and database; MA performed the selection and 
recombination analysis as well as the phylogenetic reconstructions. NP, LP, 
ACC and SG participated in the curation of the data. NP, MA and SG analysed 
the data. NP and SG wrote the paper. All authors have read and approved 
the manuscript for publication. 

Acknowledgments 

We thank Jerome Gouzy for advice and discussions. This work was 
supported by funds from the "Laboratoire d'Excellence" (LABEX) entitled 
TULIP (ANR-10-LABX-41) and grant 31003A_127325 from the Swiss National 
Science Foundation to M.A. 

Author details 

1 INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, 
F-31326 Castanet-Tolosan, France. 2 CNRS, Laboratoire des Interactions 
Plantes-Microorganismes (LIPM), UMR2594, F-31326 Castanet-Tolosan, France. 
3 Department of Computer Science, ETH Zurich, Zurich, Switzerland. 4 Swiss 
Institute of Bioinformatics, Lausanne, Switzerland. 5 Biozentrum, Department 
Biologie I, Ber. Mikrobiologie, Ludwig-Maximilians Universität München, 
Grosshaderner Str. 2-4, 82152 Martinsried, Germany. 

Received: 24 July 2013 Accepted: 29 November 2013 
Published: 6 December 2013 

References 

1. Peeters N, Guidot A, Vailleau F, Vails IVl: Ralstonia solanacearum, a 
widespread bacterial plant pathogen in the post-genomic era. Mo/ Plant 

Pathol 2013, 14:651-662. 

2. IVlansfield J, Genin S, Magori S, Citovsky V, Sriariyanum IVl, Ronald P, Dow M, 
Verdier V, Beer SV, Machado MA, Toth I, Salmond G, Foster GD: Top 10 
plant pathogenic bacteria in molecular plant pathology. Mol Plant Pathol 
2012, 13:614-629. 

3. Elphinstone JG: The Current Bacterial Wilt Situation: A Global Overview. 
In Bacf Wilt DIs Ralstonia Solanacearum Species Complex. Edited by Allen C, 
Prior P, Hayward AC. St Paul, MN, USA: APS Press; 2005:9-28. 

4. Genin S: Molecular traits controlling host range and adaptation to plants 
in Ralstonia solanacearum. New Phytol 2010, 187:920-928. 

5. Cunnac S, Occhialini A, Barberis P, Boucher C, Genin S: Inventory and 
functional analysis of the large Hrp regulon in Ralstonia solanacearum: 
identification of novel effector proteins translocated to plant host cells 
through the type III secretion system. Mol Microbiol 2004, 53:1 15-128. 

6. IVlukaihara T, Tamura N, Iwabuchi IVl: Genome-wide identification of a large 
repertoire of Ralstonia solanacearum type III effector proteins by a new 
functional screen. Mol Plant Microbe Interactions MPMIs 201 0, 23:251 -262. 

7. Poueymiro M, Genin S: Secreted proteins from Ralstonia solanacearum: a 
hundred tricks to kill a plant. Curr Opin Microbiol 2009, 12:44-52. 

8. Fegan M, Prior P: How Complex is the "Ralstonia Solanacearum Species 
Complex. In Socf Wilt DIs Ralstonia Solanacearum Species Complex. Edited 
by Allen C Prior P Hayward AC. St Paul, MN, USA: APS Press; 2005:449-461. 

9. V\/icker E, Lefeuvre P, de Cambiaire J-C, Lemaire C, Poussier S, Prior P: 
Contrasting recombination patterns and demographic histories of the 
plant pathogen Ralstonia solanacearum inferred from MLSA. ISME J 
2012, 6:961-974 

10. Genin S, Denny TP: Pathogenomics of the Ralstonia solanacearum species 
complex. Anna Rev Phytopathol 2012, 50:67-89. 

11. Remenant B, de Cambiaire J-C, Cellier G, Jacobs JIVI, Mangenot S, Barbe V, 
Lajus A, Vallenet D, Medigue C, Fegan M, Allen C, Prior P: Ralstonia syzygii, 
the blood disease bacterium and some Asian R. Solanacearum strains 
form a single genomic species despite divergent lifestyles. PLoS One 
2011,6:e24356. 

12. Guidot A, Prior P, Schoenfeld J, Carrere S, Genin S, Boucher C: Genomic 
structure and phylogeny of the plant pathogen Ralstonia solanacearum 
inferred from gene distribution analysis. J Bacterid 2007, 189:377-387. 

13. Baltrus DA, Nishimura MT, Romanchuk A, Chang JH, Mukhtar MS, Cherkis K,
Roach J, Grant SR, Jones CD, Dangl JL: Dynamic evolution of pathogenicity
revealed by sequencing and comparative genomics of 19 Pseudomonas
syringae isolates. PLoS Pathog 2011, 7:e1002132.

14. Hajri A, Brin C, Hunault G, Lardeux F, Lemaire C, Manceau C, Boureau T,
Poussier S: A "repertoire for repertoire" hypothesis: repertoires of type
three effectors are candidate determinants of host specificity in
Xanthomonas. PLoS One 2009, 4:e6632.

15. Mukaihara T, Tamura N: Identification of novel Ralstonia solanacearum 
type III effector proteins through translocation analysis of hrpB-regulated 
gene products. Microbiol Read Engl 2009, 155(Pt 7):2235-2244. 

16. Angot A, Peeters N, Lechner E, Vailleau F, Baud C, Gentzbittel L, Sartorel E,
Genschik P, Boucher C, Genin S: Ralstonia solanacearum requires F-box-like
domain-containing type III effectors to promote disease on several host
plants. Proc Natl Acad Sci USA 2006, 103:14620-14625.

17. Remigi P, Anisimova M, Guidot A, Genin S, Peeters N: Functional
diversification of the GALA type III effector family contributes to
Ralstonia solanacearum adaptation on different plant hosts. New Phytol
2011, 192:976-987.

18. Kajava AV, Anisimova M, Peeters N: Origin and evolution of GALA-LRR, a 
new member of the CC-LRR subfamily: from plants to bacteria? PLoS One 
2008, 3:e1694.

19. Lavie M, Seunes B, Prior P, Boucher C: Distribution and sequence analysis 
of a family of type III-dependent effectors correlate with the phylogeny
of Ralstonia solanacearum strains. Mol Plant Microbe Interactions MPMI
2004, 17:931-940.

20. Deslandes L, Olivier J, Peeters N, Feng DX, Khounlotham M, Boucher C,
Somssich I, Genin S, Marco Y: Physical interaction between RRS1-R, a protein
conferring resistance to bacterial wilt, and PopP2, a type III effector
targeted to the plant nucleus. Proc Natl Acad Sci USA 2003, 100:8024-8029.

21. Deslandes L, Olivier J, Theulieres F, Hirsch J, Feng DX, Bittner-Eddy P,
Beynon J, Marco Y: Resistance to Ralstonia solanacearum in Arabidopsis
thaliana is conferred by the recessive RRS1-R gene, a member of a novel
family of resistance genes. Proc Natl Acad Sci USA 2002, 99:2404-2409.

22. Tasset C, Bernoux M, Jauneau A, Pouzet C, Briere C, Kieffer-Jacquinod S,
Rivas S, Marco Y, Deslandes L: Autoacetylation of the Ralstonia solanacearum
effector PopP2 targets a lysine residue essential for RRS1-R-mediated
immunity in Arabidopsis. PLoS Pathog 2010, 6:e1001202.

23. Sole M, Popa C, Mith O, Sohn KH, Jones JDG, Deslandes L, Valls M: The awr
gene family encodes a novel class of Ralstonia solanacearum type III 
effectors displaying virulence and avirulence activities. Mol Plant Microbe 
Interactions MPMI 2012, 25:941-953. 

24. Cunnac S, Boucher C, Genin S: Characterization of the cis-acting regulatory 
element controlling HrpB-mediated activation of the type III secretion 
system and effector genes in Ralstonia solanacearum. J Bacteriol 2004,
186:2309-2318.

25. Sharma V, Firth AE, Antonov I, Fayet O, Atkins JF, Borodovsky M, Baranov PV:
A pilot study of bacterial genes with disrupted ORFs reveals a surprising
profusion of protein sequence recoding mediated by ribosomal
frameshifting and transcriptional realignment. Mol Biol Evol 2011,
28:3195-3211. 

26. Edgar RC: MUSCLE: a multiple sequence alignment method with reduced 
time and space complexity. BMC Bioinformatics 2004, 5:113.

27. Mukaihara T, Tamura N, Murata Y, Iwabuchi M: Genetic screening of Hrp
type III-related pathogenicity genes controlled by the HrpB transcriptional
activator in Ralstonia solanacearum. Mol Microbiol 2004,
54:863-875.

28. Lindeberg M, Stavrinides J, Chang JH, Alfano JR, Collmer A, Dangl JL,
Greenberg JT, Mansfield JW, Guttman DS: Proposed guidelines for a 
unified nomenclature and phylogenetic analysis of type III Hop effector 
proteins in the plant pathogen Pseudomonas syringae. Mol Plant Microbe 
Interactions MPMI 2005, 18:275-282. 

29. Lemoine F, Lespinet O, Labedan B: Assessing the evolutionary rate of
positional orthologous genes in prokaryotes using synteny data. BMC Evol 
Biol 2007, 7:237. 

30. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV: Computational methods
for gene orthology inference. Brief Bioinform 2011, 12:379-391.

31. Poueymiro M, Cunnac S, Barberis P, Deslandes L, Peeters N, Cazale-Noel A-C, 
Boucher C, Genin S: Two type III secretion system effectors from Ralstonia 
solanacearum GMI1000 determine host-range specificity on tobacco.
Mol Plant Microbe Interactions MPMI 2009, 22:538-550. 

32. Arlat M, Van Gijsegem F, Huet JC, Pernollet JC, Boucher CA: PopA1, a
protein which induces a hypersensitivity-like response on specific Petunia
genotypes, is secreted via the Hrp pathway of Pseudomonas solanacearum.
EMBO J 1994, 13:543-553.

33. Gueneron M, Timmers AC, Boucher C, Arlat M: Two novel proteins, PopB,
which has functional nuclear localization signals, and PopC, which has a
large leucine-rich repeat domain, are secreted through the hrp-secretion
apparatus of Ralstonia solanacearum. Mol Microbiol 2000, 36:261-277.

34. Li J-G, Liu H-X, Cao J, Chen L-F, Gu C, Allen C, Guo J-H: PopW of Ralstonia 
solanacearum, a new two-domain harpin targeting the plant cell wall. 
Mol Plant Pathol 2010, 11:371-381.

35. Meyer D, Cunnac S, Gueneron M, Declercq C, Van Gijsegem F, Lauber E, 
Boucher C, Arlat M: PopF1 and PopF2, two proteins secreted by the type
III protein secretion system of Ralstonia solanacearum, are translocators
belonging to the HrpF/NopX family. J Bacteriol 2006, 188:4903-4917.

36. Lavie M, Shillington E, Eguiluz C, Grimsley N, Boucher C: PopP1, a new
member of the YopJ/AvrRxv family of type III effector proteins, acts as a
host-specificity factor and modulates aggressiveness of Ralstonia
solanacearum. Mol Plant Microbe Interactions MPMI 2002, 15:1058-1068.

37. Carney BF, Denny TP: A cloned avirulence gene from Pseudomonas 
solanacearum determines incompatibility on Nicotiana tabacum at the 
host species level. J Bacteriol 1990, 172:4836-4843.

38. Yabuuchi E, Kosako Y, Yano I, Hotta H, Nishiuchi Y: Transfer of two
Burkholderia and an Alcaligenes species to Ralstonia gen. nov.: proposal
of Ralstonia pickettii (Ralston, Palleroni and Doudoroff 1973) comb. nov.,
Ralstonia solanacearum (Smith 1896) comb. nov. and Ralstonia eutropha
(Davis 1969) comb. nov. Microbiol Immunol 1995, 39:897-904.

39. Goure J, Pastor A, Faudry E, Chabert J, Dessen A, Attree I: The V antigen of 
Pseudomonas aeruginosa is required for assembly of the functional 
PopB/PopD translocation pore in host cell membranes. Infect Immun 
2004, 72:4741-4750.

40. Schesser K, Dukuzumuremyi JM, Cilio C, Borg S, Wallis TS, Pettersson S, 
Galyov EE: The Salmonella YopJ-homologue AvrA does not possess
YopJ-like activity. Microb Pathog 2000, 28:59-70. 

41. Remenant B, Babujee L, Lajus A, Medigue C, Prior P, Allen C: Sequencing 
of K60, type strain of the major plant pathogen Ralstonia solanacearum. 
J Bacteriol 2012, 194:2742-2743. 

42. Cao Y, Tian B, Liu Y, Cai L, Wang H, Lu N, Wang M, Shang S, Luo Z, Shi J:
Genome sequencing of Ralstonia solanacearum FQY_4, isolated from a
bacterial wilt nursery used for breeding crop resistance. Genome Announc 
2013, 1:e00125-13. 

43. Li Z, Wu S, Bai X, Liu Y, Lu J, Liu Y, Xiao B, Lu X, Fan L: Genome sequence
of the tobacco bacterial wilt pathogen Ralstonia solanacearum. J Bacteriol 
2011, 193:6088-6089. 

44. Bernoux M, Timmers T, Jauneau A, Briere C, de Wit PJGM, Marco Y,
Deslandes L: RD19, an Arabidopsis cysteine protease required for
RRS1-R-mediated resistance, is relocalized to the nucleus by the Ralstonia
solanacearum PopP2 effector. Plant Cell 2008, 20:2252-2264. 

45. Didelot X, Maiden MCJ: Impact of recombination on bacterial evolution.
Trends Microbiol 2010, 18:315-322. 

46. Dufraigne C, Fertil B, Lespinats S, Giron A, Deschavanne P: Detection and 
characterization of horizontal transfers in prokaryotes using genomic 
signature. Nucleic Acids Res 2005, 33:e6. 

47. Kado CI: Horizontal gene transfer: sustaining pathogenicity and 
optimizing host-pathogen interactions. Mol Plant Pathol 2009, 10:143-150.

48. De Lange O, Schreiber T, Schandry N, Radeck J, Braun KH, Koszinowski J,
Heuer H, Strauss A, Lahaye T: Breaking the DNA-binding code of Ralstonia 
solanacearum TAL effectors provides new possibilities to generate plant 
resistance genes against bacterial wilt disease. New Phytol 2013, 199:773-786. 

49. Fall S, Mercier A, Bertolla F, Calteau A, Gueguen L, Perriere G, Vogel TM,
Simonet P: Horizontal gene transfer regulation in bacteria as a "spandrel"
of DNA repair mechanisms. PLoS One 2007, 2:e1055.

50. Wolf YI, Koonin EV: A tight link between orthologs and bidirectional best
hits in bacterial and archaeal genomes. Genome Biol Evol 2012, 4:1286-1294. 

51. Gil M, Zanetti MS, Zoller S, Anisimova M: CodonPhyML: fast maximum 
likelihood phylogeny estimation under codon substitution models. 
Mol Biol Evol 2013, 30:1270-1280.

52. Anisimova M, Nielsen R, Yang Z: Effect of recombination on the accuracy 
of the likelihood method for detecting positive selection at amino acid 
sites. Genetics 2003, 164:1229-1236. 

53. Reed FA, Tishkoff SA: Positive selection can create false hotspots of 
recombination. Genetics 2006, 172:2011-2014.

54. O'Reilly PF, Birney E, Balding DJ: Confounding between recombination 
and selection, and the Ped/Pop method for detecting selection. 
Genome Res 2008, 18:1304-1313. 



55. Ryan RP, Vorholter F-J, Potnis N, Jones JB, Van Sluys M-A, Bogdanove AJ, 
Dow JM: Pathogenomics of Xanthomonas: understanding bacterium- 
plant interactions. Nat Rev Microbiol 2011, 9:344-355.

56. Baltrus DA, Nishimura MT, Dougherty KM, Biswas S, Mukhtar MS, Vicente J,
Holub EB, Dangl JL: The molecular basis of host specialization in bean
pathovars of Pseudomonas syringae. Mol Plant Microbe Interactions MPMI
2012, 25:877-888. 

57. Singer AU, Rohde JR, Lam R, Skarina T, Kagan O, Dileo R, Chirgadze NY,
Cuff ME, Joachimiak A, Tyers M, Sansonetti PJ, Parsot C, Savchenko A:
Structure of the Shigella T3SS effector IpaH defines a new class of E3
ubiquitin ligases. Nat Struct Mol Biol 2008, 15:1293-1301.

58. Rohde JR, Breitkreutz A, Chenal A, Sansonetti PJ, Parsot C: Type III secretion 
effectors of the IpaH family are E3 ubiquitin ligases. Cell Host Microbe 
2007, 1:77-83.

59. Dudler R: Manipulation of Host Proteasomes as a Virulence Mechanism
of Plant Pathogens. Annu Rev Phytopathol 2013, 51:521-542. 

60. Angot A, Vergunst A, Genin S, Peeters N: Exploitation of eukaryotic
ubiquitin signaling pathways by effectors translocated by bacterial type 
III and type IV secretion systems. PLoS Pathog 2007, 3:e3. 

61. Buttner D, Nennstiel D, Klusener B, Bonas U: Functional analysis of HrpF, a
putative type III translocon protein from Xanthomonas campestris pv.
vesicatoria. J Bacteriol 2002, 184:2389-2398.

62. Dsouza M, Larsen N, Overbeek R: Searching for patterns in genomic data. 
Trends Genet TIG 1997, 13:497-498. 

63. Schiex T, Gouzy J, Moisan A, de Oliveira Y: FrameD: a flexible program for 
quality check and gene prediction in prokaryotic genomes and noisy 
matured eukaryotic sequences. Nucleic Acids Res 2003, 31:3738-3741. 

64. Li L, Stoeckert CJ Jr, Roos DS: OrthoMCL: identification of ortholog groups 
for eukaryotic genomes. Genome Res 2003, 13:2178-2189. 

65. Szalkowski AM, Anisimova M: Markov models of amino acid substitution 
to study proteins with intrinsically disordered regions. PLoS One 2011,
6:e20488. 

66. Szalkowski A, Anisimova M: Graph-based modeling of tandem repeats 
improves global multiple sequence alignment. Nucleic Acids Res 2013, 
41:e162.

67. Goldman N, Yang Z: A codon-based model of nucleotide substitution for 
protein-coding DNA sequences. Mol Biol Evol 1994, 11:725-736.

68. Le SQ, Gascuel O: An improved general amino acid replacement matrix.
Mol Biol Evol 2008, 25:1307-1320.

69. Yang Z: Maximum likelihood phylogenetic estimation from DNA 
sequences with variable rates over sites: approximate methods. J Mol
Evol 1994, 39:306-314.

70. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O: New
algorithms and methods to estimate maximum-likelihood phylogenies:
assessing the performance of PhyML 3.0. Syst Biol 2010, 59:307-321.

71. Anisimova M, Gil M, Dufayard J-F, Dessimoz C, Gascuel O: Survey of branch
support methods demonstrates accuracy, power, and robustness of fast
likelihood-based approximation schemes. Syst Biol 2011, 60:685-699.

72. Letunic I, Bork P: Interactive Tree Of Life v2: online annotation and 
display of phylogenetic trees made easy. Nucleic Acids Res 2011,
39:W475-W478 (Web Server issue). 

73. McVean G, Awadalla P, Fearnhead P: A coalescent-based method for 
detecting and estimating recombination from gene sequences. 
Genetics 2002, 160:1231-1241. 

74. Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models 
for heterogeneous selection pressure at amino acid sites. Genetics 2000, 
155:431-449. 



doi:10.1186/1471-2164-14-859

Cite this article as: Peeters et al.: Repertoire, unified nomenclature and
evolution of the Type III effector gene set in the Ralstonia solanacearum 
species complex. BMC Genomics 2013 14:859. 



